Results 1 - 20 of 261
1.
Article in English | MEDLINE | ID: mdl-38607721

ABSTRACT

N4-acetylcytidine (ac4C) is a post-transcriptional modification in mRNA that is critical in mRNA translation in terms of stability and regulation. In the past few years, numerous approaches employing convolutional neural networks (CNN) and Transformer have been proposed for the identification of ac4C sites, with each class of approach possessing distinct characteristics. CNN-based methods excel at extracting local features and positional information, whereas Transformer-based ones stand out in establishing long-range dependencies and generating global representations. Given the importance of both local and global features in mRNA ac4C site identification, we propose a novel method termed TransC-ac4C, which combines CNN and Transformer to enhance the feature extraction capability and improve the identification accuracy. Five different feature encoding strategies (One-hot, NCP, ND, EIIP, and K-mer) are employed to generate the mRNA sequence representations, so that the sequence attributes and the physical and chemical properties of the sequences can be embedded. To strengthen the relevance of features, we construct a novel feature fusion method. Firstly, the CNN is employed to process the five single features, which are then stitched together and fed to the Transformer layer. Then, our approach employs CNN to extract local features and Transformer subsequently to establish global long-range dependencies among the extracted features. We use 5-fold cross-validation to evaluate the model, and the evaluation indicators are significantly improved. The prediction accuracy on the two datasets reaches as high as 81.42.
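
As an illustration of the encoding step named in the abstract above, the following minimal sketch builds one-hot, NCP, EIIP and ND features for a short RNA sequence. The NCP triples and EIIP constants are values commonly used in the RNA-modification literature and are assumed here, since the abstract does not list them; the function names are hypothetical and this is not the TransC-ac4C implementation.

```python
# Illustrative per-nucleotide encodings. The values are common conventions,
# assumed here rather than taken from the TransC-ac4C paper.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "U": (0, 0, 0, 1)}
# NCP (nucleotide chemical property): (ring structure, functional group, hydrogen bonding)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}
# EIIP (electron-ion interaction pseudopotential) per nucleotide
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}


def nucleotide_density(seq):
    """ND: frequency of the nucleotide at position i within the prefix seq[:i]."""
    counts = {}
    density = []
    for i, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        density.append(counts[base] / i)
    return density


def encode(seq):
    """Concatenate one-hot, NCP, EIIP and ND into one feature row per position."""
    seq = seq.upper().replace("T", "U")
    nd = nucleotide_density(seq)
    return [
        list(ONE_HOT[b]) + list(NCP[b]) + [EIIP[b]] + [nd[i]]
        for i, b in enumerate(seq)
    ]


features = encode("ACGUA")  # 5 positions, 9 features each
```

K-mer features are omitted from the sketch because they describe windows rather than single positions; each row here could serve as one input channel group for a CNN.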

2.
Mol Oncol ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627210

ABSTRACT

Different molecular classifications for gastric cancer (GC) have been proposed based on multi-omics platforms with the long-term goal of improved precision treatment. However, the GC (phospho)proteome remains incompletely characterized, particularly at the level of tyrosine phosphorylation. In addition, previous multiomics-based stratification of patient cohorts has lacked identification of corresponding cell line models and comprehensive validation of broad or subgroup-selective therapeutic targets. To address these knowledge gaps, we applied a reverse approach, undertaking the most comprehensive (phospho)proteomic analysis of GC cell lines to date and cross-validating this using publicly available data. Mass spectrometry (MS)-based (phospho)proteomic and tyrosine phosphorylation datasets were subjected to individual or integrated clustering to identify subgroups that were subsequently characterized in terms of enriched molecular processes and pathways. Significant congruence was detected between cell line proteomic and specific patient-derived transcriptomic subclassifications. Many protein kinases exhibiting 'outlier' expression or phosphorylation in the cell line dataset exhibited genomic aberrations in patient samples and association with poor prognosis, with casein kinase I isoform delta/epsilon (CSNK1D/E) being experimentally validated as potential therapeutic targets. Src family kinases were predicted to be commonly hyperactivated in GC cell lines, consistent with broad sensitivity to the next-generation Src inhibitor eCF506. In addition, phosphoproteomic and integrative clustering segregated the cell lines into two subtypes, with epithelial-mesenchyme transition (EMT) and proliferation-associated processes enriched in one, designated the EMT subtype, and metabolic pathways, cell-cell junctions, and the immune response dominating the features of the other, designated the metabolism subtype. 
Application of kinase activity prediction algorithms and interrogation of gene dependency and drug sensitivity databases predicted that the mechanistic target of rapamycin kinase (mTOR) and dual specificity mitogen-activated protein kinase kinase 2 (MAP2K2) represented potential therapeutic targets for the EMT and metabolism subtypes, respectively, and this was confirmed using selective inhibitors. Overall, our study provides novel, in-depth insights into GC proteomics, kinomics, and molecular taxonomy and reveals potential therapeutic targets that could provide the basis for precision treatments.

3.
Bioinform Adv ; 4(1): vbae035, 2024.
Article in English | MEDLINE | ID: mdl-38549946

ABSTRACT

Motivation: PE/PPE proteins, highly abundant in the Mycobacterium genome, play a vital role in virulence and immune modulation. Understanding their functions is key to comprehending the internal mechanisms of Mycobacterium. However, a lack of dedicated resources has limited research into PE/PPE proteins. Results: Addressing this gap, we introduce MycobactERIal PE/PPE proTeinS (MERITS), a comprehensive 3D structure database specifically designed for PE/PPE proteins. MERITS hosts 22 353 non-redundant PE/PPE proteins, encompassing details like physicochemical properties, subcellular localization, post-translational modification sites, protein functions, and measures of antigenicity, toxicity, and allergenicity. MERITS also includes data on their secondary and tertiary structure, along with other relevant biological information. MERITS is designed to be user-friendly, offering interactive search and data browsing features to aid researchers in exploring the potential functions of PE/PPE proteins. MERITS is expected to become a crucial resource in the field, aiding in developing new diagnostics and vaccines by elucidating the sequence-structure-functional relationships of PE/PPE proteins. Availability and implementation: MERITS is freely accessible at http://merits.unimelb-biotools.cloud.edu.au/.

4.
Comput Biol Med ; 173: 108339, 2024 May.
Article in English | MEDLINE | ID: mdl-38547658

ABSTRACT

The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.


Subjects
Artificial Intelligence; Drug Discovery; Drug Evaluation, Preclinical; Neural Networks, Computer; Benchmarking
5.
Environ Pollut ; 348: 123852, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38531468

ABSTRACT

Model-estimated air pollution exposure assessments have been extensively employed in the evaluation of health risks associated with air pollution. However, few studies synthetically evaluate the reliability of model-estimated PM2.5 products in health risk assessment by comparing them with ground-based monitoring station air quality data. In response to this gap, we undertook a meticulously structured systematic review and meta-analysis. Our objective was to aggregate existing comparative studies to ascertain the disparity in mortality effect estimates derived from model-estimated ambient PM2.5 exposure versus those based on monitoring station-observed PM2.5 exposure. We conducted searches across multiple databases, namely PubMed, Scopus, and Web of Science, using predefined keywords. Ultimately, ten studies were included in the review. Of these, seven investigated long-term annual exposure, while the remaining three studies focused on short-term daily PM2.5 exposure. Despite variances in the estimated Exposure-Response (E-R) associations, most studies revealed positive associations between ambient PM2.5 exposure and all-cause and cardiovascular mortality, irrespective of the exposure being estimated through models or observed at monitoring stations. Our meta-analysis revealed that all-cause mortality risk associated with model-estimated PM2.5 exposure was in line with that derived from station-observed sources. The pooled Relative Risk (RR) was 1.083 (95% CI: 1.047, 1.119) for model-estimated exposure, and 1.089 (95% CI: 1.054, 1.125) for station-observed sources (p = 0.795). In conclusion, most model-estimated air pollution products have demonstrated consistency in estimating mortality risk compared to data from monitoring stations. 
However, only a limited number of studies have undertaken such comparative analyses, underscoring the need for more comprehensive investigations to validate the reliability of these model-estimated exposures in mortality risk assessment.
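
The comparison of the two pooled relative risks reported above can be approximated with a simple z-test on the log scale. The sketch below recovers each standard error from its 95% CI and tests the difference. It is a back-of-the-envelope check under a normal approximation, not necessarily the subgroup test the authors used, so it is not expected to reproduce the reported p = 0.795 exactly; the function names are hypothetical.

```python
import math


def log_rr_and_se(rr, lower, upper, z_crit=1.96):
    """Recover log(RR) and its standard error from a 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(rr), se


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Pooled estimates quoted in the abstract
model_log_rr, model_se = log_rr_and_se(1.083, 1.047, 1.119)
station_log_rr, station_se = log_rr_and_se(1.089, 1.054, 1.125)

# Difference of two independent log-RR estimates
z = (station_log_rr - model_log_rr) / math.sqrt(model_se**2 + station_se**2)
p_value = two_sided_p(z)
```

Under these assumptions the p-value is far above 0.05, which agrees with the abstract's conclusion that the two exposure sources give consistent mortality estimates.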


Subjects
Air Pollutants; Air Pollution; Air Pollutants/toxicity; Air Pollutants/analysis; Particulate Matter/analysis; Environmental Exposure/adverse effects; Environmental Exposure/analysis; Reproducibility of Results; Air Pollution/analysis; Risk Assessment
6.
Lancet Planet Health ; 8(3): e146-e155, 2024 03.
Article in English | MEDLINE | ID: mdl-38453380

ABSTRACT

BACKGROUND: The acute health effects of short-term (hours to days) exposure to fine particulate matter (PM2·5) have been well documented; however, the global mortality burden attributable to this exposure has not been estimated. We aimed to estimate the global, regional, and urban mortality burden associated with short-term exposure to PM2·5 and the spatiotemporal variations in this burden from 2000 to 2019. METHODS: We combined estimated global daily PM2·5 concentrations, annual population counts, country-level mortality rates, and epidemiologically derived exposure-response functions to estimate the mortality attributable to short-term PM2·5 exposure from 2000 to 2019, in the continental regions and in 13 189 urban centres worldwide at a spatial resolution of 0·1° × 0·1°. We tested the robustness of our mortality estimates with different theoretical minimum risk exposure levels, lag effects, and exposure-response functions. FINDINGS: Approximately 1 million (95% CI 690 000-1·3 million) premature deaths per year from 2000 to 2019 were attributable to short-term PM2·5 exposure, representing 2·08% (1·41-2·75) of total global deaths or 17 (11-22) premature deaths per 100 000 population. Annually, 0·23 million (0·15 million-0·30 million) deaths attributable to short-term PM2·5 exposure were in urban areas, constituting 22·74% of the total global deaths attributable to this cause and accounting for 2·30% (1·56-3·05) of total global deaths in urban areas. The sensitivity analyses showed that our worldwide estimates of mortality attributed to short-term PM2·5 exposure were robust. INTERPRETATION: Short-term exposure to PM2·5 contributes a substantial global mortality burden, particularly in Asia and Africa, as well as in global urban areas. Our results highlight the importance of mitigation strategies to reduce short-term exposure to air pollution and its adverse effects on human health. 
FUNDING: Australian Research Council and the Australian National Health and Medical Research Council.
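
The METHODS sentence above combines concentrations, population, baseline mortality and an exposure-response function. A generic health-impact calculation of that kind can be sketched as follows. The log-linear response function, the coefficient `beta`, the minimum-risk level `tmrel` and all input numbers are hypothetical placeholders chosen for illustration; the abstract does not give the authors' actual functions or parameters.

```python
import math


def attributable_deaths(pm25, tmrel, beta, population, daily_mortality_rate):
    """Deaths attributable to one day's PM2.5 in one grid cell.

    All parameters are hypothetical inputs for illustration only.
    """
    excess = max(0.0, pm25 - tmrel)          # exposure above the minimum-risk level
    rr = math.exp(beta * excess)             # assumed log-linear exposure-response
    attributable_fraction = (rr - 1.0) / rr  # share of deaths due to the exposure
    return population * daily_mortality_rate * attributable_fraction


# Hypothetical cell: 1 million people, 20 baseline deaths per day
deaths = attributable_deaths(
    pm25=35.0, tmrel=15.0, beta=0.0007,
    population=1_000_000, daily_mortality_rate=2e-5,
)
```

Summing this quantity over days and grid cells would give an annual burden; varying `tmrel` and `beta` corresponds to the kind of sensitivity analysis the abstract mentions.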


Subjects
Air Pollution; Particulate Matter; Humans; Particulate Matter/analysis; Australia; Air Pollution/adverse effects; Air Pollution/analysis; Mortality, Premature; Asia
7.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38552307

ABSTRACT

MOTIVATION: Cell-type clustering is a crucial first step for single-cell RNA-seq data analysis. However, existing clustering methods often provide different results on cluster assignments with respect to their own data pre-processing, choice of distance metrics, and strategies of feature extraction, thereby limiting their practical applications. RESULTS: We propose the Cross-Tabulation Ensemble Clustering (CTEC) method, which formulates two re-clustering strategies (distribution- and outlier-based) via cross-tabulation. Benchmarking experiments on five scRNA-Seq datasets illustrate that the proposed CTEC method offers significant improvements over the individual clustering methods. Moreover, CTEC-DB outperforms the state-of-the-art ensemble methods for single-cell data clustering, with 45.4% and 17.1% improvement over the single-cell aggregated from ensemble clustering method (SAFE) and the single-cell aggregated clustering via Mixture model ensemble method (SAME), respectively, on the two-method ensemble test. AVAILABILITY AND IMPLEMENTATION: The source code of the benchmark in this work is available at the GitHub repository https://github.com/LWCHN/CTEC.git.
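
The cross-tabulation that the method above starts from is simply a contingency table of two cluster assignments over the same cells. The sketch below builds such a table with the standard library; the labels are made up, the function name is hypothetical, and the re-clustering strategies of CTEC itself are not reproduced here.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how many cells fall into each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    return Counter(zip(labels_a, labels_b))


# Hypothetical cluster assignments for six cells from two clustering methods
method_a = [0, 0, 1, 1, 2, 2]
method_b = ["x", "x", "x", "y", "y", "y"]
table = cross_tabulate(method_a, method_b)

# Pairs supported by few cells mark where the two methods disagree
disputed = [pair for pair, count in table.items() if count < 2]
```

In this toy example the cells of cluster 1 are split between "x" and "y", so those two pairs are the ones an ensemble step would need to resolve.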


Subjects
Algorithms; Single-Cell Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Cluster Analysis; Data Analysis; Gene Expression Profiling/methods
8.
J Chem Inf Model ; 64(4): 1407-1418, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38334115

ABSTRACT

Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage is designed to fuse different feature embeddings through a transformer encoder. In the second stage, a support vector machine model is employed to quantify the pathogenicity of SAVs after dimensionality reduction. The prediction performance of TransEFVP on blind test data achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, higher than the existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and codes for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
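
For readers unfamiliar with the metrics quoted above, the Matthews correlation coefficient and the F1-score can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are unrelated to the TransEFVP test set.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives
example_mcc = mcc(tp=40, fp=10, tn=45, fn=5)
example_f1 = f1_score(tp=40, fp=10, fn=5)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 when classes are imbalanced.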


Subjects
Algorithms; Proteins; Humans; Proteins/chemistry; Amino Acid Sequence; Neural Networks, Computer; Amino Acids
9.
BMC Bioinformatics ; 25(1): 13, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38195423

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically approximately 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Recent advances in machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have been reported. ReCGBM, a gradient boosting-based model, demonstrates superior performance compared with existing methods. Nonetheless, ReCGBM operates solely as a binary classifier despite the presence of two cleavage sites in a typical pre-miRNA. Previous approaches have focused on utilizing only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. There is a compelling need for the development of a novel model to address these limitations. RESULTS: In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned the secondary structure embeddings of pre-miRNA. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in the binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS: Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its capacity to accommodate multi-class classification tasks has enhanced the practical utility of our model.


Subjects
Deep Learning; MicroRNAs; Humans; Benchmarking; Machine Learning; Nucleotides
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38261340

ABSTRACT

The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.


Subjects
Benchmarking; Gene Regulatory Networks; Area Under Curve; Learning; Neural Networks, Computer
11.
Article in English | MEDLINE | ID: mdl-38190667

ABSTRACT

Origins of replication sites (ORIs) are crucial genomic regions where DNA replication initiation takes place, playing pivotal roles in fundamental biological processes like cell division, gene expression regulation, and DNA integrity. Accurate identification of ORIs is essential for comprehending cell replication, gene expression, and mutation-related diseases. However, experimental approaches for ORI identification are often expensive and time-consuming, leading to the growing popularity of computational methods. In this study, we present PLANNER (DeeP LeArNiNg prEdictor for ORI), a novel approach for species-specific and cell-specific prediction of eukaryotic ORIs. PLANNER uses the multi-scale ktuple sequences as input and employs the DNABERT pre-training model with transfer learning and ensemble learning strategies to train accurate predictive models. Extensive empirical test results demonstrate that PLANNER achieved superior predictive performance compared to state-of-the-art approaches, including iOri-Euk, Stack-ORI, and ORI-Deep, within specific cell types and across different cell types. Furthermore, by incorporating an interpretable analysis mechanism, we provide insights into the learned patterns, facilitating the mapping from discovering important sequential determinants to comprehensively analysing their biological functions. To facilitate the widespread utilisation of PLANNER, we developed an online webserver and local stand-alone software, available at http://planner.unimelb-biotools.cloud.edu.au/ and https://github.com/CongWang3/PLANNER, respectively.
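
The "multi-scale ktuple" input mentioned above can be pictured as splitting a DNA sequence into overlapping k-mers for several values of k. The sketch below shows that tokenisation only; the choice of k = 3 to 6 and the function names are assumptions for illustration, and the code does not call DNABERT or reproduce PLANNER's pipeline.

```python
def kmer_tokens(seq, k):
    """Split a sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_scale_tokens(seq, ks=(3, 4, 5, 6)):
    """Tokenise the same sequence at several k values (assumed scales)."""
    return {k: kmer_tokens(seq, k) for k in ks}


tokens = multi_scale_tokens("ATGCGTAC")
# tokens[3] starts with "ATG", "TGC", "GCG"; tokens[6] has three 6-mers
```

Each scale yields a separate token sequence, so one model per scale could be trained and their outputs combined in an ensemble.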

12.
Nucleic Acids Res ; 52(D1): D732-D737, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37870467

ABSTRACT

ICEberg 3.0 (https://tool2-mml.sjtu.edu.cn/ICEberg3/) is an upgraded database that provides comprehensive insights into bacterial integrative and conjugative elements (ICEs). In comparison to the previous version, three key enhancements were introduced. First, through text mining and manual curation, it now encompasses details of 2065 ICEs, 607 IMEs and 275 CIMEs, including 430 with experimental support. Second, ICEberg 3.0 systematically categorizes cargo gene functions of ICEs into six groups based on literature curation and predictive analysis, providing a profound understanding of ICEs' diverse biological traits. The cargo gene prediction pipeline is integrated into the online tool ICEfinder 2.0. Finally, ICEberg 3.0 aids the analysis and exploration of ICEs from the human microbiome. Extracted and manually curated from 2405 distinct human microbiome samples, the database comprises 1386 putative ICEs, offering insights into the complex dynamics of Bacteria-ICE-Cargo networks within the human microbiome. With the recent updates, ICEberg 3.0 enhances its capability to unravel the intricacies of ICE biology, particularly in the characterization and understanding of cargo gene functions and ICE interactions within the microbiome. This enhancement may facilitate the investigation of the dynamic landscape of ICE biology and its implications for microbial communities.


Subjects
Bacteria; Conjugation, Genetic; Databases, Genetic; Humans; Bacteria/genetics; Databases, Factual; DNA Transposable Elements; Microbiota
13.
Nucleic Acids Res ; 52(D1): D784-D790, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37897352

ABSTRACT

TADB 3.0 (https://bioinfo-mml.sjtu.edu.cn/TADB3/) is an updated database that provides comprehensive information on bacterial types I to VIII toxin-antitoxin (TA) loci. Compared with the previous version, three major improvements are introduced: First, with the aid of text mining and manual curation, it records the details of 536 TA loci with experimental support, including 102, 403, 8, 14, 1, 1, 3 and 4 TA loci of types I to VIII, respectively; Second, by leveraging the upgraded TA prediction tool TAfinder 2.0 with a stringent strategy, TADB 3.0 collects 211 697 putative types I to VIII TA loci predicted in 34 789 completely sequenced prokaryotic genomes, providing researchers with a large-scale dataset for further follow-up analysis and characterization; Third, based on their genomic locations, relationships of 69 019 TA loci and 60 898 mobile genetic elements (MGEs) are visualized by interactive networks accessible through the user-friendly web page. With the recent updates, TADB 3.0 may provide improved in silico support for comprehending the biological roles of TA pairs in prokaryotes and their functional associations with MGEs.


Subjects
Bacterial Proteins; Databases, Genetic; Interspersed Repetitive Sequences; Toxin-Antitoxin Systems; Bacterial Proteins/genetics; Genome, Bacterial; Toxin-Antitoxin Systems/genetics; Genetic Loci
14.
Curr Environ Health Rep ; 11(1): 46-60, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38038861

ABSTRACT

PURPOSE OF REVIEW: Wildfire smoke is associated with human health effects and is becoming an increasing public health concern. However, a comprehensive synthesis of the current evidence on the health impacts of ambient wildfire smoke on children and adolescents, an exceptionally vulnerable population, is lacking. We conduct a systematic review of peer-reviewed epidemiological studies on the association between wildfire smoke and health of children and adolescents. RECENT FINDINGS: We searched for studies available in MEDLINE, EMBASE, and Scopus from database inception up to October 11, 2022. Of 4926 studies initially identified, 59 studies from 14 countries were ultimately eligible. Over 33.3% of the studies were conducted in the USA, and two focused on multiple countries. The exposure assessment of wildfire smoke was heterogeneous, with wildfire-specific particulate matter with diameter ≤ 2.5 µm (PM2.5, 22.0%) and all-source PM2.5 (22.0%) during the wildfire period most frequently used. Over half of the studies (50.6%) focused on respiratory-related morbidities/mortalities. Wildfire smoke exposure was consistently associated with enhanced risks of adverse health outcomes in children/adolescents. Meta-analysis results presented a pooled relative risk (RR) of 1.04 (95% confidence interval [CI], 0.96-1.12) for all-cause respiratory morbidity, 1.11 (95% CI, 0.93-1.32) for asthma, 0.93 (95% CI, 0.85-1.03) for bronchitis, and 1.13 (95% CI, 1.05-1.23) for upper respiratory infection, and a change of -21.71 g in birth weight (95% CI, -32.92 to -10.50), per 10 µg/m3 increment in wildfire-specific PM2.5/all-source PM2.5 during wildfire events. The majority of studies found that wildfire smoke was associated with multiple adverse health outcomes among children and adolescents, with respiratory morbidities of significant concern. In-utero exposure to wildfire smoke may increase the risk of adverse birth outcomes and have long-term impacts on height. 
Higher maternal baseline exposure to wildfire smoke and poor family-level baseline birthweight respectively elevated the risks of preterm birth and low birth weight associated with wildfire smoke. More studies in low- and middle-income countries and studies focusing on extremely young children are needed. Despite technological progress, wildfire smoke exposure measurements remain uncertain, demanding improved methodologies that assess wildfire smoke levels more precisely and thus quantify the corresponding health impacts and guide public mitigation actions.


Subjects
Asthma; Premature Birth; Wildfires; Infant, Newborn; Child; Female; Humans; Adolescent; Child, Preschool; Smoke/adverse effects; Particulate Matter/adverse effects; Birth Weight
15.
Nucleic Acids Res ; 52(D1): D562-D571, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953313

ABSTRACT

Single-cell proteomics enables the direct quantification of protein abundance at single-cell resolution, providing valuable insights into cellular phenotypes beyond what can be inferred from transcriptome analysis alone. However, the lack of large-scale integrated databases hinders researchers from accessing and exploring single-cell proteomics, impeding the advancement of this field. To fill this gap, we present a comprehensive database, namely Single-cell Proteomic DataBase (SPDB, https://scproteomicsdb.com/), for general single-cell proteomic data, including antibody-based or mass spectrometry-based single-cell proteomics. Equipped with standardized data processing and a user-friendly web interface, SPDB provides unified data formats for convenient interaction with downstream analysis, and offers not only dataset-level but also protein-level data search and exploration capabilities. To enable detailed exhibition of single-cell proteomic data, SPDB also provides a module for visualizing data from the perspectives of cell metadata or protein features. The current version of SPDB encompasses 133 antibody-based single-cell proteomic datasets involving more than 300 million cells and over 800 marker/surface proteins, and 10 mass spectrometry-based single-cell proteomic datasets involving more than 4000 cells and over 7000 proteins. Overall, SPDB is envisioned as a useful resource that will benefit the wider research community by providing detailed insights into proteomics from the single-cell perspective.


Subjects
Proteins; Proteomics; Antibodies; Knowledge Bases; Mass Spectrometry; Humans; Animals; Single-Cell Analysis
16.
IEEE J Biomed Health Inform ; 28(2): 1134-1143, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37963003

ABSTRACT

Cancer is one of the most challenging health problems worldwide. Accurate cancer survival prediction is vital for clinical decision making. Many deep learning methods have been proposed to understand the association between patients' genomic features and survival time. In most cases, the gene expression matrix is fed directly to the deep learning model. However, this approach completely ignores the interactions between biomolecules, and the resulting models can only learn the expression levels of genes to predict patient survival. In essence, the interaction between biomolecules is the key to determining the direction and function of biological processes. Proteins are the building blocks and principal agents of life activities, and as such, their complex interaction network is potentially informative for deep learning methods. Therefore, a more reliable approach is to have the neural network learn both gene expression data and protein interaction networks. We propose a new computational approach, termed CRESCENT, which is a protein-protein interaction (PPI) prior knowledge graph-based convolutional neural network (GCN) to improve cancer survival prediction. CRESCENT relies on the gene expression networks rather than gene expression levels to predict patient survival. The performance of CRESCENT is evaluated on a large-scale pan-cancer dataset consisting of 5991 patients from 16 different types of cancers. Extensive benchmarking experiments demonstrate that our proposed method is competitive in terms of the time-dependent concordance index (Ctd) when compared with several existing state-of-the-art approaches. Experiments also show that incorporating the network structure between genomic features effectively improves cancer survival prediction.


Subjects
Neoplasms; Protein Interaction Maps; Humans; Protein Interaction Maps/genetics; Algorithms; Neural Networks, Computer; Genomics; Neoplasms/genetics
17.
Comput Biol Med ; 168: 107681, 2024 01.
Article in English | MEDLINE | ID: mdl-37992470

ABSTRACT

Multidrug-resistant Gram-negative bacteria have evolved into a worldwide threat to human health; over recent decades, polymyxins have re-emerged in clinical practice due to their high activity against multidrug-resistant bacteria. Nevertheless, the nephrotoxicity and neurotoxicity of polymyxins seriously hinder their practical use in the clinic. Based on the quantitative structure-activity relationship (QSAR), analogue design is an efficient strategy for discovering biologically active compounds with fewer adverse effects. To accelerate the polymyxin analogue discovery process and find polymyxin analogues with high antimicrobial activity against Gram-negative bacteria, here we developed PmxPred, a GCN- and CatBoost-based machine learning framework. The RDKit descriptors were used for molecule and residue representation, and the ensemble learning model was utilized for the antimicrobial activity prediction. This framework was trained and evaluated on multiple Gram-negative bacteria datasets, including Acinetobacter baumannii, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and a general Gram-negative bacteria dataset, achieving an AUROC of 0.857, 0.880, 0.756, 0.895 and 0.865 on the independent test, respectively. PmxPred outperformed the transfer learning method that was trained on 10 million molecules. We interpreted our well-trained model by analysing the importance of global and residue features. Overall, PmxPred provides a powerful additional tool for predicting active polymyxin analogues, and holds the potential to elucidate the mechanisms underlying the antimicrobial activity of polymyxins. The source code is publicly available on GitHub (https://github.com/yanwu20/PmxPred).


Subjects
Gram-Negative Bacterial Infections , Polymyxins , Humans , Polymyxins/pharmacology , Polymyxins/chemistry , Anti-Bacterial Agents/chemistry , Gram-Negative Bacterial Infections/drug therapy , Gram-Negative Bacterial Infections/microbiology , Gram-Negative Bacteria , Drug Resistance, Multiple, Bacterial , Escherichia coli , Microbial Sensitivity Tests
18.
Small ; 20(6): e2305052, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37798622

ABSTRACT

The rapid increase and spread of Gram-negative bacteria resistant to many or all existing treatments threaten a return to the preantibiotic era. The presence of bacterial polysaccharides that impede the penetration of many antimicrobials and protect the bacteria from the innate immune system contributes to resistance and pathogenicity. No currently approved antibiotics target the polysaccharide regions of microbes. Here, monolaurin-based niosomes, the first lipid nanoparticles that can eliminate bacterial polysaccharides from hypervirulent Klebsiella pneumoniae, are described. Their combination with polymyxin B shows no cytotoxicity in vitro and is highly effective in combating K. pneumoniae infection in vivo. Comprehensive mechanistic studies have revealed that antimicrobial activity proceeds via a multimodal mechanism. Initially, lipid nanoparticles disrupt polysaccharides, then outer and inner membranes are destabilized and destroyed by polymyxin B, resulting in synergistic cell lysis. This novel lipidic nanoparticle system shows tremendous promise as a highly effective antimicrobial treatment targeting multidrug-resistant Gram-negative pathogens.


Subjects
Nanoparticles , Polymyxin B , Polymyxin B/pharmacology , Liposomes/pharmacology , Anti-Bacterial Agents/pharmacology , Gram-Negative Bacteria , Klebsiella pneumoniae , Polysaccharides, Bacterial/pharmacology , Microbial Sensitivity Tests , Drug Resistance, Multiple, Bacterial
19.
Brief Bioinform ; 25(1), 2023 11 22.
Article in English | MEDLINE | ID: mdl-38152979

ABSTRACT

The identification and characterization of essential genes are central to our understanding of the core biological functions in eukaryotic organisms, and have important implications for the treatment of diseases caused by, for example, cancers and pathogens. Given the major constraints in testing the functions of genes of many organisms in the laboratory, due to the absence of in vitro cultures and/or gene perturbation assays for most metazoan species, there has been a need to develop in silico tools for the accurate prediction or inference of essential genes to underpin systems biological investigations. Major advances in machine learning approaches provide unprecedented opportunities to overcome these limitations and accelerate the discovery of essential genes on a genome-wide scale. Here, we developed and evaluated a large language model- and graph neural network (LLM-GNN)-based approach, called 'Bingo', to predict essential protein-coding genes in the metazoan model organisms Caenorhabditis elegans and Drosophila melanogaster as well as in Mus musculus and Homo sapiens (a HepG2 cell line) by integrating an LLM and GNNs with adversarial training. Bingo predicts essential genes under two 'zero-shot' scenarios with transfer learning, showing promise to compensate for a lack of high-quality genomic and proteomic data for non-model organisms. In addition, the attention mechanisms and GNNExplainer were employed to reveal the functional sites and structural domains that contribute most to essentiality. In conclusion, Bingo offers the prospect of accurately inferring the essential genes of little- or under-studied organisms of interest, and provides a biological explanation for gene essentiality.


Subjects
Drosophila Proteins , Genes, Essential , Mice , Animals , Proteomics , Drosophila melanogaster/genetics , Workflow , Neural Networks, Computer , Proteins/genetics , Microfilament Proteins/genetics , Drosophila Proteins/genetics
20.
Bioinform Adv ; 3(1): vbad184, 2023.
Article in English | MEDLINE | ID: mdl-38146538

ABSTRACT

Motivation: Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools have been released. While some methods are highly cited and used, many suffer from relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations. Results: We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations as tools that are available only as source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.

...